Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

integration: added HaystackAction #398

Open
wants to merge 8 commits into
base: main
Choose a base branch
from
Open

Conversation

zilto
Copy link
Collaborator

@zilto zilto commented Oct 16, 2024

Add HaystackAction which allows to define a Burr Action from a Haystack Component.

Changes

  • HaystackAction class
  • haystack_pipeline_to_burr_graph() function
  • added a tutorial notebook detailing some motivation and illustrating code with and without the integration

How I tested this

  • no test yet

Notes

Checklist

  • PR has an informative and human-readable title (this will be pulled into the release notes)
  • Changes are limited to a single goal (no scope creep)
  • Code passed the pre-commit check & code is left cleaner/nicer than when first encountered.
  • Any change in functionality is tested
  • New functions are documented (with a description, list of inputs, and expected output)
  • Placeholder code is flagged / future TODOs are captured in comments
  • Project documentation has been updated if adding/changing functionality.

Important

Add HaystackAction and conversion function to integrate Haystack components and pipelines with Burr.

  • Integration:
    • Add HaystackAction class in haystack.py to convert Haystack Component to Burr Action.
    • Add haystack_pipeline_to_burr_graph() function in haystack.py to convert Haystack Pipeline to Burr Graph.
  • Documentation:
    • Add haystack.rst for integration reference.
    • Add tutorial in examples/haystack-integration/README.md.
  • Examples:
    • Add application.py demonstrating integration usage.
  • Tests:
    • Add test_burr_haystack.py to test HaystackAction and pipeline conversion.
  • Dependencies:
    • Add haystack to pyproject.toml under optional dependencies.

This description was created by Ellipsis for f9e171b. It will automatically update as commits are pushed.

Copy link

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❌ Changes requested. Reviewed everything up to 660a268 in 32 seconds

More details
  • Looked at 215 lines of code in 2 files
  • Skipped 2 files when reviewing.
  • Skipped posting 0 drafted comments based on config settings.

Workflow ID: wflow_71RMLCbEWlBw8VfO


Want Ellipsis to fix these issues? Tag @ellipsis-dev in a comment. You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

burr/integrations/haystack.py Outdated Show resolved Hide resolved
burr/integrations/haystack.py Outdated Show resolved Hide resolved
Copy link

github-actions bot commented Oct 16, 2024

A preview of f9e171b is uploaded and can be seen here:

https://burr.dagworks.io/pull/398

Changes may take a few minutes to propagate. Since this is a preview of production, content with draft: true will not be rendered. The source is here: https://github.com/DAGWorks-Inc/burr/tree/gh-pages/pull/398/

ellipsis-dev[bot]

This comment was marked as outdated.

ellipsis-dev[bot]

This comment was marked as outdated.

ellipsis-dev[bot]

This comment was marked as duplicate.

Copy link
Contributor

@elijahbenizzy elijahbenizzy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice.

  1. Let's mark the pipeline converter as experimental
  2. Let's do tests with "custom" haystack components -- should be easy for tests
  3. Add to refs/docs
    https://docs.haystack.deepset.ai/docs/custom-components

burr/integrations/haystack.py Show resolved Hide resolved
self._component = component
self._name = name
self._reads = list(reads.keys()) if isinstance(reads, dict) else reads
self._writes = list(writes.values()) if isinstance(writes, dict) else writes
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should probably validate that writes doesn't have a duplicate mapping, given that it's the values.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed some mappings to ensure reads={socket: state_field} and writes={state_field: socket}.

This seems to make the most sense because:

  • on reads, each key should be unique and map to a unique kwarg of Component.run(). The same state value could be passed to different kwargs
  • on writes, each key should be a unique state field. Even though Component.run() returns a dictionary where keys are unique, we want to prevent calling .update() on the same field several times.

In other words, this would be invalid:

class CustomComponent:
  def run() -> dict:
     return {"bar": 1, "baz": 2}

# mapping both `bar` and `baz` to `State["foo"]` is invalid
HaystackAction(
   component=CustomComponent()
   writes={"bar": "foo", "baz": "foo"}
)

# reverting the mapping ensures that two outputs can't be mapped to `foo`
HaystackAction(
   component=CustomComponent()
   writes={"foo1": "bar", "foo2": "baz"}
)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think that makes sense, just make sure it's thoroughly documented

return self._writes

@property
def inputs(self) -> tuple[dict[str, str], dict[str, str]]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably assign this in the constructor? Break it into a helper function.

Copy link
Collaborator Author

@zilto zilto Oct 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding .inputs(), it is currently a property, but it's return value may change if we allow a .bind() method. What was previously a required/optional input is now bound.

If .bound() is removed, then yes we could set values in __init__()

It seems that this logic belongs to .inputs() and wouldn't be of much use elsewhere.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, .bound() should return a copy of the object so the .inputs() are frozen IMO. That said, this should take in bound_inputs in the constructor, and we don't need a method? You don't really want properties dynamically computed like this, it's hard to debug and a bit iffy IMO.


return state.update(**state_update)

def bind(self, **kwargs):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So bind is only (currently) for functions, actions don't have it. Might be a bit confusing here, but if we like it, it should really be at the Action level, as that makes a lot of sense IMO. Then it could just do this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Otherwise we can make this a more specific name to not conflict

Copy link
Collaborator Author

@zilto zilto Oct 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand, the pattern is to only allow bound_params at init? This affects the Component.run() method

This would be ok:

haystack_retrieve_documents = HaystackAction(
    component=InMemoryEmbeddingRetriever(InMemoryDocumentStore()),
    name="retrieve_documents",
    reads=["query_embedding"],
    writes=["documents"],
    bound_params={"foo": "bar"},
)

This would not be ok?

haystack_retrieve_documents = HaystackAction(
    component=InMemoryEmbeddingRetriever(InMemoryDocumentStore()),
    name="retrieve_documents",
    reads=["query_embedding"],
    writes=["documents"],
)
haystack_retrieve_documents.bind(foo="bar")

I should simply remove the .bind() method?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think so, or call it something else? Alternatively, we can make bind work at the action level. Currently it's only for functions, and class-based actions don't have it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So yeah, the pattern should be a copy-operation here, or remove this and just do the inputs in the parameters.

burr/integrations/haystack.py Show resolved Hide resolved
burr/integrations/haystack.py Outdated Show resolved Hide resolved
ellipsis-dev[bot]

This comment was marked as resolved.

ellipsis-dev[bot]

This comment was marked as outdated.

Copy link

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Looks good to me! Incremental review on 82ee637 in 15 seconds

More details
  • Looked at 13 lines of code in 1 files
  • Skipped 0 files when reviewing.
  • Skipped posting 1 drafted comments based on config settings.
1. pyproject.toml:65
  • Draft comment:
    Add 'haystack' to the '[project.optional-dependencies]' section to ensure it is recognized as an optional dependency.
  • Reason this comment was not posted:
    Comment did not seem useful.

Workflow ID: wflow_DEHzhJ627MrB9dwi


You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

Copy link

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Looks good to me! Incremental review on f9e171b in 26 seconds

More details
  • Looked at 26 lines of code in 2 files
  • Skipped 0 files when reviewing.
  • Skipped posting 2 drafted comments based on config settings.
1. burr/integrations/haystack.py:191
  • Draft comment:
    Remove the leftover print statement for cleaner code.
  • Reason this comment was not posted:
    Comment looked like it was already resolved.
2. pyproject.toml:118
  • Draft comment:
    Ensure that haystack-ai is the correct package name for the intended dependency.
  • Reason this comment was not posted:
    Comment did not seem useful.

Workflow ID: wflow_kEbWbKwmrel9XTj4


You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

Copy link
Contributor

@elijahbenizzy elijahbenizzy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One quick point about bind() doing a mutation, otherwise looks good

raise ValueError(
f"Socket `{output_socket_name}` missing from output of `Component.run()`"
) from e
print(state_update)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

printline

self._component = component
self._name = name
self._reads = list(reads.keys()) if isinstance(reads, dict) else reads
self._writes = list(writes.values()) if isinstance(writes, dict) else writes
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think that makes sense, just make sure it's thoroughly documented


return state.update(**state_update)

def bind(self, **kwargs):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So yeah, the pattern should be a copy-operation here, or remove this and just do the inputs in the parameters.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants